Abstract



We define a class of probabilistic models in terms of an operator algebra of stochastic processes, and 
a representation for this class in terms of stochastic parameterized grammars. A syntactic specifica- 
tion of a grammar is mapped to semantics given in terms of a ring of operators, so that grammatical 
composition corresponds to operator addition or multiplication. The operators are generators for 
the time-evolution of stochastic processes. Within this modeling framework one can express data 
clustering models, logic programs, ordinary and stochastic differential equations, graph grammars, 
and stochastic chemical reaction kinetics. This mathematical formulation connects these apparently 
distant fields to one another and to mathematical methods from quantum field theory and operator 
algebra. 

1 Introduction 

Probabilistic models of application domains are central to pattern recognition, machine learning, and scientific mod- 
eling in various fields. Consequently, unifying frameworks are likely to be fruitful for one or more of these fields. 
There are also more technical motivations for pursuing the unification of diverse model types. In multiscale modeling, 
models of the same system at different scales can have fundamentally different characteristics (e.g. deterministic vs. 
stochastic) and yet must be placed in a single modeling framework. In machine learning, automated search over a 
wide variety of model types may be of great advantage. In this paper we propose Stochastic Parameterized Grammars 
(SPG's) and their generalization to Dynamical Grammars (DG's) as such a unifying framework. To this end we define 
mathematically both the syntax and the semantics of this formal modeling language. 

The essential idea is that there is a "pool" of fully specified parameter-bearing terms such as {bacterium(x),
macrophage(y), redbloodcell(z)} where x, y and z might be position vectors. A grammar can include rules such
as

{bacterium(x), macrophage(y)} → macrophage(y) with p(||x - y||)

which specify the probability per unit time, p, that the macrophage ingests and destroys the bacterium as a function of
the distance ||x - y|| between their centers. Sets of such rules are a natural way to specify many processes. We will
map such grammars to stochastic processes in both continuous time (Section 3.2) and discrete time (Section 3.3), and
relate the two definitions (Section 3.5). A key feature of the semantic maps is that they are naturally defined in terms
of an algebraic ring of time evolution operators: they map operator addition and multiplication into independent or
strongly dependent compositions of stochastic processes, respectively.

The stochastic process semantics defined here is a mathematical, algebraic object. It is independent of any particular
simulation algorithm, though we will discuss (Section 3.4) a powerful technique for generating simulation algorithms,
and we will demonstrate (Section 4.2) the interpretation of certain subclasses of SPG's as a logic programming
language. Other applications that will be demonstrated are to data clustering ([1]), chemical reaction kinetics
(Section 4.1), graph grammars and string grammars (Section 4.3), and systems of ordinary differential equations and
systems of stochastic differential equations (Section 4.4). Other frameworks that describe model classes that may
overlap with those described here are numerous and include: branching or birth-and-death processes, marked point
processes, MGS modeling language using topological cell complexes, interacting particle systems, the BLOG
probabilistic object model, adaptive mesh refinement with rewrite rules, stochastic pi-calculus, and colored Petri
Nets. The mapping $\Psi_{c/d}$ to an operator algebra of stochastic processes, however, appears to be novel.

The present paper is an abbreviated summary of [1].

2 Syntax Definition 

Consider the rewrite rule 

$$A_1(x_1), A_2(x_2), \ldots, A_n(x_n) \to B_1(y_1), B_2(y_2), \ldots, B_m(y_m) \quad \text{with } p(\{x_i\}, \{y_j\}) \tag{1}$$

where the $A_i$ and $B_j$ denote symbols $\tau_a$ chosen from an arbitrary alphabet set $T = \{\tau_a | a \in A\}$ of "types". In addition
these type symbols carry expressions for parameters $x_i$ or $y_j$ chosen from a base language $\mathcal{L}_P$ defined below. The
A's can appear in any order, as can the B's. Different A's and B's appearing in the rule can denote the same alphabet
symbol $\tau_a$, with equal or unequal parameter values $x_i$ or $y_j$. $p$ is a nonnegative function, assumed to be denoted by an
expression in a base language $\mathcal{L}_R$ defined below, and also assumed to be an element of a vector space $\mathcal{F}$ of real-valued
functions. Informally, $p$ is interpreted as a nonnegative probability rate: the independent probability per unit time that
any possible instantiation of the rule will "fire" if its left hand side precondition remains continuously satisfied for a
small time. This interpretation will be formalized in the semantics.

We now define $\mathcal{L}_P$. Each term $A_i(x_i)$ or $B_j(y_j)$ is of type $\tau_a$ and its parameters $x_i$ take values in an associated
(ordered) Cartesian product set $V_a$ of $d_a$ factor spaces chosen (possibly with repetition) from a set of base spaces
$\mathcal{D} = \{D_b | b \in B\}$. Each $D_b$ is a measure space with measure $\mu_b$. Particular $D_b$ may for example be isomorphic to the
integers Z with counting measure, or the real numbers R with Lebesgue measure. The ordered choice of spaces $D_b$
in $V_a = \prod_{k=1}^{d_a} D_{b=\sigma(ak)}$ constitutes the type signature $\{\sigma_{ak} \in B | 1 \le k \le d_a\}$ of type $\tau_a$. (As an aside, polymorphic
argument type signatures are supported by defining a derived type signature
$\{\sigma_{akb} = (D_b \subseteq D_{\sigma(ak)}) \in \{T, F\} | 1 \le k \le d_a, b \in B\}$. For example we can regard Z as a subset of R.)
Correspondingly, parameter expressions $x_i$ are tuples of length $d_a$, such that each component $x_{ik}$ is either a constant
in the space $D_{b=\sigma(ak)}$, or a variable $X_c$ ($c \in C$) that is restricted to taking values in that same space $D_{b(c)}$. The
variables that appear in a rule this way may be repeated any number of times in parameter expressions $x_i$ or $y_j$
within a rule, provided only that all components $x_{ik}$ take values in the same space $D_{b=\sigma(ak)}$. A substitution
$\theta : c \mapsto D_{b(c)}$ of values for variables $X_c$ assigns the same value to all appearances of each variable $X_c$ within a rule.
Hence each parameter expression $x_i$ takes values in a fixed tuple space $V_a$ under any substitution $\theta$. This defines the
language $\mathcal{L}_P$.

We now constrain the language $\mathcal{L}_R$. Each nonnegative function $p((x_i), (y_j))$ is a probability rate: the independent
probability per unit time that any particular instantiation of the rule will fire, assuming its precondition remains
continuously satisfied for a small interval of time. It is a function only of the parameter values denoted by $(x_i)$ and
$(y_j)$, and not of time. Each $p$ is denoted by an expression in a base language $\mathcal{L}_R$ that is closed under addition and
multiplication and contains a countable field of constants, dense in R, such as the rationals or the algebraic numbers.
$p$ is assumed to be a nonnegative-valued function in a Banach space $\mathcal{F}(V)$ of real-valued functions defined on the
Cartesian product space $V$ of all the value spaces $V_{a(i)}$ of the terms appearing in the rule, taken in a standardized order
such as nondecreasing order of type index a on the left hand side followed by nondecreasing order of type index a
on the right hand side of the rule. Provided $\mathcal{L}_R$ is expressive enough, it is possible to factor $p_r((x_i), (y_j))$ within $\mathcal{L}_R$
as a product $p_r = p_r^{pure}((x_i)) \Pr_r((y_j)|(x_i))$ of a conditional distribution on output parameters given input parameters,
$\Pr_r((y_j)|(x_i))$, and a total probability rate $p_r^{pure}((x_i))$ as a function of input parameters only.

With these definitions we can use a more compact notation by eliminating the A's and B's, which denote types, in
favor of the types themselves. (The expression $\tau_i(x_i)$ is called a parameterized term, which can match to a
parameter-bearing object or term instance in a "pool" of such objects.) The caveat is that a particular type $\tau_i$ may
appear any finite number of times, and indeed a particular parameterized term $\tau_i(x_i)$ may appear any finite number of
times. So we use multisets $\{\ldots \tau_{a(i)}(x_i) \ldots\}_*$ (in which the same object $\tau_{a(i)}(x_i)$ may appear as the value of several
different indices i) for both the LHS and RHS (Left Hand Side and Right Hand Side) of a rule:

$$\{\tau_{a(i)}(x_i) | i \in I_L\}_* \to \{\tau_{a'(j)}(y_j) | j \in I_R\}_* \quad \text{with } p_r((x_i), (y_j)) \tag{2}$$

Here the same object $\tau_{a(i)}(x_i)$ may appear as the value of several different indices i under the mappings $i \mapsto (a(i), x_i)$
and/or $i \mapsto (a'(i), y_i)$. Finally we introduce the shorthand notation $\tau_i = \tau_{a(i)}$ and $\tau'_j = \tau_{a'(j)}$, and revert to the
standard notation { } for multisets; then we may write $\{\tau_i(x_i)\} \to \{\tau'_j(y_j)\}$ with $p_r((x_i), (y_j))$.
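
The multiset rule form of (Equation 2) can be sketched as a small data structure. The following is only an illustrative sketch for grounded rules (no variables); the names `Term`, `Rule`, `can_fire` and `fire`, and the exponential rate function, are assumptions made for this example and do not come from any SPG implementation.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Callable, Tuple
import math

# Hypothetical encoding: a grounded parameterized term is (type name, parameter tuple).
Term = Tuple[str, tuple]

@dataclass(frozen=True)
class Rule:
    lhs: Tuple[Term, ...]        # LHS multiset, stored as a tuple (repeats allowed)
    rhs: Tuple[Term, ...]        # RHS multiset
    rate: Callable[[], float]    # p_r evaluated on the grounded parameters

def can_fire(pool: Counter, rule: Rule) -> bool:
    """True if the pool holds every LHS term with sufficient multiplicity."""
    need = Counter(rule.lhs)
    return all(pool[term] >= count for term, count in need.items())

def fire(pool: Counter, rule: Rule) -> Counter:
    """Return a new pool with the LHS terms removed and the RHS terms added."""
    if not can_fire(pool, rule):
        raise ValueError("rule precondition not satisfied")
    new_pool = pool.copy()
    new_pool.subtract(Counter(rule.lhs))
    new_pool.update(Counter(rule.rhs))
    return +new_pool             # unary plus drops entries whose count is zero

# The macrophage rule from the Introduction, with an assumed rate p(d) = exp(-d).
x, y = (0.0, 0.0), (3.0, 4.0)
rule = Rule(
    lhs=(("bacterium", x), ("macrophage", y)),
    rhs=(("macrophage", y),),
    rate=lambda: math.exp(-math.dist(x, y)),
)
pool = Counter({("bacterium", x): 1, ("macrophage", y): 1})
after = fire(pool, rule)
```

Storing both sides as tuples and converting to `Counter` on use keeps repeated terms, which is the point of using multisets rather than sets.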

In addition to the with clause of a rule following the LHS → RHS header, several other alternative clauses can be
used and have translations into with clauses. For example, "subject to f(x, y)" is translated into "with δ(f(x, y))"
where δ is an appropriate Dirac or Kronecker delta function that enforces a constraint f(x, y) = 0. Other examples
are given in [1]. The translation of "solving e" or "solve e" will be defined in terms of with clauses in Section 4.4.
As a matter of definition, Stochastic Parameterized Grammars do not contain solving/solve clauses, but Dynamical
Grammars may include them. There exists a preliminary implementation of an interpreter for most of this syntax in
the form of a Mathematica notebook, which draws samples according to the semantics of Section 3 below.

A Stochastic Parameterized Grammar (SPG) $\Gamma$ consists of (minimally) a collection of such rules with common type
set $T$, base space set $\mathcal{D}$, type signature specification $\sigma$, and probability rate language $\mathcal{L}_R$. After defining the semantics
of such grammars, it will be possible to define semantically equivalent classes of SPG's that are untyped or that have
richer argument languages $\mathcal{L}_P$.

3 Semantic Maps 

We provide a semantics function $\Psi_c(\Gamma)$ in terms of an operator algebra that results in a stochastic process, if it
exists, or a special "undefined" element if the stochastic process doesn't exist. The stochastic process is defined by
a very high-dimensional differential equation (the Master Equation) for the evolution of a probability distribution in
continuous time. On the other hand we will also provide a semantics function $\Psi_d(\Gamma)$ that results in a discrete-time
stochastic process for the same grammar, in the form of an operator that evolves the probability distribution forward
by one discrete rule-firing event. In each case the stochastic process specifies the time evolution of a probability
distribution over the contents of a "pool" of grounded parameterized terms $\tau_a(x_a)$ that can each be present in the pool
with any allowed multiplicity from zero to $n_a^{(max)}$. We will relate these two alternative "meanings" of an SPG, $\Psi_c(\Gamma)$
in continuous time and $\Psi_d(\Gamma)$ in discrete time.

A state of the "pool of term instances" is defined as an integer-valued function $n$: the "copy number"
$n_a(x_a) \in \{0, 1, 2, \ldots\}$ of parameterized terms $\tau_a(x_a)$ that are grounded (have no variable symbols $X_c$), for any combination
$(a, x_a) \in V = \coprod_a a \otimes V_a$ of type index $a \in A$ and parameter value $x_a \in V_a$. We denote this state by the "indexed
set" notation for such functions, $\{n_a(x)\}$. Each type $\tau_a$ may be assigned a maximum value $n_a^{(max)}$ for all $n_a(x_a)$,
commonly $\infty$ (no constraint on copy numbers) or 1 (so $n_a(x_a) \in \{0, 1\}$, which means each term-value combination is
simply present or absent). The state of the full system at time t is defined as a probability distribution on all possible
values of this (already large) pool state: $\Pr(\{n_a(x_a) | (a, x_a) \in V\}; t) \equiv \Pr(\{n_a(x_a)\}; t)$. The probability distribution
that puts all probability density on a particular pool state $\{n_a(x_a)\}$ is denoted $|\{n_a(x_a)\}\rangle$.

For continuous-time we define the semantics $\Psi_c(\Gamma)$ of our grammar as the solution, if it exists, of the Master Equation
$d\Pr(t)/dt = H \cdot \Pr(t)$, which can be written out as:

$$\frac{d}{dt} \Pr(\{n_a(x)\}; t) = \sum_{\{m_a(x)\}} H_{\{n\}\{m\}} \Pr(\{m_a(x)\}; t) \tag{3}$$

and which has the formal solution $\Pr(t) = \exp(tH) \cdot \Pr(0)$.

For discrete-time semantics $\Psi_d(\Gamma)$ there is a linear map $\hat H$ which evolves unnormalized probabilities forward by one
rule-firing time step. The probabilities must of course be normalized, so that after s discrete time steps the probability
is:

$$\Pr(s) = c_s \hat H^s \cdot \Pr(0) = (\hat H^s \cdot \Pr(0)) / (1 \cdot \hat H^s \cdot \Pr(0)) \tag{4}$$

which, taken over all $s \ge 0$ and $\Pr(\{n_a(x)\}; 0)$, defines $\Psi_d(\Gamma)$. In both cases the long-time evolution of the system
may converge to a limiting distribution $\Psi^*(\Gamma) \cdot \Pr(0) = \lim_{t \to \infty} \Pr(\{n_a(x)\}; t)$ which is a key feature of the
semantics, but we do not define the semantics $\Psi_{c/d}(\Gamma)$ as being only this limit even if it exists. Thus semantics-preserving
transformations of grammars are fixedpoint-preserving transformations of grammars but the converse may not be true.

The Master Equation is completely determined by the generators $H$ and $\hat H$ which in turn are simply composed from
elementary operators acting on the space of such probability distributions. They are elements of the operator
polynomial ring $\mathbb{R}[\{B_\alpha\}]$ defined over a set of basis operators $\{B_\alpha\}$ in terms of operator addition, scalar multiplication, and
noncommutative operator multiplication. These basis operators $\{B_\alpha\}$ provide elementary manipulations of the copy
numbers $n_a(x)$.



3.1 Operator algebra 
The simplest basis operators $\{B_\alpha\}$ are elementary creation operators $\{\hat a_a(x) | a \in A \wedge x \in V_a\}$ and annihilation
operators $\{a_a(x) | a \in A \wedge x \in V_a\}$ that increase or decrease each copy number $n_a(x)$ in a particular way (reviewed
in [2]):

$$\hat a_a(x) |\{n_b(y)\}\rangle = |\{n_b(y) + \delta_K(a,b)\delta_K(x,y)\}\rangle \tag{5}$$
$$a_a(x) |\{n_b(y)\}\rangle = n_a(x) |\{n_b(y) - \delta_K(a,b)\delta_K(x,y)\}\rangle \tag{6}$$

where $\delta_K(x, y)$ is the Kronecker delta function. These two operator types then generate $N_a(x) = \hat a_a(x) a_a(x)$:

$$N_a(x) |\{n_b(y)\}\rangle = \hat a_a(x) a_a(x) |\{n_b(y)\}\rangle = n_a(x) |\{n_b(y)\}\rangle .$$

We can write these operators $\hat a$, $a$ as finite or infinite dimensional matrices depending on the maximum copy number
$n_a^{(max)}$ for type $\tau_a$. If $n_a^{(max)} = 1$ (for a fermionic term), and we omit the type subscripts, which are all assumed equal below, then

$$\hat a = \begin{pmatrix} 0 & 0 \\ 1 & 0 \end{pmatrix}, \qquad a = \begin{pmatrix} 0 & 1 \\ 0 & 0 \end{pmatrix}.$$

Likewise if $n_a^{(max)} = \infty$ (for a bosonic term), $\hat a_{nm} = \delta_{n,m+1}$ and $a_{nm} = m\,\delta_{n+1,m}$. By truncating this matrix to finite
size $n^{(max)} < \infty$ we may compute that for some polynomial $Q(N | n^{(max)})$ of degree $n^{(max)} - 1$ in $N$ with rational
coefficients,

$$[a(x), \hat a(y)] = \delta(x - y)\,[I + N Q(N | n^{(max)})]$$

where $\delta$ is the Dirac delta (generalized) function appropriate to the (product) measure $\mu$ on the relevant value space $V$.
E.g. if $n^{(max)} = 1$ then $Q = -2$; if $n^{(max)} = \infty$ then $Q = 0$.
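
The truncated matrices can be checked numerically. The sketch below is an assumption-laden illustration, not part of the formalism: it handles a single type at a single parameter value, so the Dirac delta factor is dropped, and the helper names (`creation`, `annihilation`, `matmul`, `commutator`) are hypothetical. It builds the matrices from Equations 5 and 6 on copy numbers 0..n_max and evaluates the commutator.

```python
def creation(n_max):
    """Creation operator on copy numbers 0..n_max; entry [n][m] is 1 when n == m + 1."""
    size = n_max + 1
    return [[1 if n == m + 1 else 0 for m in range(size)] for n in range(size)]

def annihilation(n_max):
    """Annihilation operator: a|m> = m |m-1>, so entry [n][m] is m when n + 1 == m."""
    size = n_max + 1
    return [[m if n + 1 == m else 0 for m in range(size)] for n in range(size)]

def matmul(A, B):
    """Plain matrix product of two square lists-of-lists."""
    size = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(size)) for j in range(size)]
            for i in range(size)]

def commutator(A, B):
    """[A, B] = AB - BA."""
    AB, BA = matmul(A, B), matmul(B, A)
    return [[AB[i][j] - BA[i][j] for j in range(len(A))] for i in range(len(A))]

# Fermionic case, n_max = 1.
a_hat, a = creation(1), annihilation(1)
number_op = matmul(a_hat, a)          # N = a_hat a, diagonal with entries n
comm_fermi = commutator(a, a_hat)     # equals I + N * Q with Q = -2

# A larger truncation: the commutator is the identity except on the top state.
comm_3 = commutator(annihilation(3), creation(3))
```

For n_max = 3 the diagonal of the commutator is (1, 1, 1, -3): the truncation only alters the highest copy-number state, which is the role played by the polynomial Q.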

3.2 Continuous-time semantics 

For a grammar rule number "r" of the form of (Equation 2) we define the operator $\hat O_r$ that first (instantaneously) destroys
all parameterized terms on the LHS and then (immediately and instantaneously) creates all parameterized terms on the
RHS. This happens independently of time or other terms in the pool. Assuming that the parameter expressions x, y
contain no variables $X_c$, the effect of this event is:

$$\hat O_r = p_r((x_i), (y_j)) \left[\prod_{j \in rhs(r)} \hat a_{a'(j)}(y_j)\right] \left[\prod_{i \in lhs(r)} a_{a(i)}(x_i)\right] \tag{7}$$

If there are variables $\{X_c\}$, we must sum or integrate over all their possible values in $\bigotimes_c D_{b(c)}$:

$$\hat O_r = \int_{D_{b(1)}} \cdots \int_{D_{b(c)}} \cdots \left(\prod_c d\mu_{b(c)}(X_c)\right) p_r((x_i(\{X_c\})), (y_j(\{X_c\}))) \left[\prod_{j \in rhs(r)} \hat a_{a'(j)}(y_j(\{X_c\}))\right] \left[\prod_{i \in lhs(r)} a_{a(i)}(x_i(\{X_c\}))\right] \tag{8}$$

Thus, syntactic variable-binding has the semantics of multiple integration. A "monotonic rule" has all its LHS terms
appear also on the RHS, so that nothing is destroyed. Unfortunately $\hat O_r$ doesn't conserve probability because
probability inflow to new states (described by $\hat O_r$) must be balanced by outflow from the current state (diagonal matrix elements).
The following operator conserves probability: $O_r = \hat O_r - \mathrm{diag}(1^T \cdot \hat O_r)$.

For the entire grammar the time evolution operator is simply a sum of the generators for each rule:

$$H = \sum_r O_r = \sum_r \hat O_r - \mathrm{diag}\Big(1^T \cdot \sum_r \hat O_r\Big) = \hat H - D \tag{9}$$

This superposition implements the basic principle that every possible rule firing is an exponential process, all
happening in parallel until a firing occurs. Note that (Equation 7), (Equation 8) and $\hat H = \sum_r \hat O_r$ are encompassed by
the polynomial ring $\mathbb{R}[\{B_\alpha\}]$ where the basis operators include all creation and annihilation operators. Ring addition
(as in Equation 9 or Equation 8) corresponds to independently firing processes; ring operator multiplication (as in
Equation 7) corresponds to obligatory event co-occurrence of the constituent events that define a process, in immediate
succession, and nonnegative scalar multiplication corresponds to speeding up or slowing down a process.
Commutation relations between operators describe the exact extent to which the order of event occurrence matters.
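
As a small illustration of the construction H = Ĥ - D, the sketch below assembles the generator for one parameterless type with two assumed rules, a birth rule (∅ → A at rate b) and a death rule (A → ∅ at rate k), on a truncated state space 0..n_max. The function name `generator` and the two-rule example are assumptions made for this sketch only.

```python
def generator(birth_rate, death_rate, n_max):
    """Return (H_hat, D, H) as lists for copy numbers 0..n_max of a single type.

    H_hat is the sum of the unnormalized rule operators, D holds the diagonal
    entries 1^T . H_hat (total outflow rate per state), and H = H_hat - diag(D).
    """
    size = n_max + 1
    H_hat = [[0.0] * size for _ in range(size)]
    for m in range(size):
        if m + 1 < size:
            H_hat[m + 1][m] += birth_rate       # b * a_hat : |m> -> |m+1>
        if m >= 1:
            H_hat[m - 1][m] += death_rate * m   # k * a     : |m> -> m |m-1>
    D = [sum(H_hat[n][m] for n in range(size)) for m in range(size)]
    H = [[H_hat[n][m] - (D[m] if n == m else 0.0) for m in range(size)]
         for n in range(size)]
    return H_hat, D, H

H_hat, D, H = generator(birth_rate=2.0, death_rate=0.5, n_max=3)
column_sums = [sum(H[n][m] for n in range(4)) for m in range(4)]
```

Each column of H sums to zero, which is the matrix form of probability conservation: inflow to new states is balanced by outflow on the diagonal.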



3.3 Discrete-time SPG semantics 

The operator $\hat H$ describes the flow of probability per unit time, over an infinitesimal time interval, into new states
resulting from a single rule-firing of any type. If we condition the probability distribution on a single rule having fired,
setting aside the probability weight for all other possibilities, the normalized distribution is
$c_1 \hat H \cdot p_0 = (\hat H \cdot p_0)/(1 \cdot \hat H \cdot p_0)$. Iterating, the state of the discrete-time grammar after s rule firing steps is $\Psi_d(\Gamma)$ as given by (Equation 4), where
$\hat H = \sum_r \hat O_r$ as before. The normalization can be state-dependent and hence dependent on s, so $c_s \ne (c_1)^s$. This is a
critical distinction between stochastic grammar and Markov chain models, for which $c_s = (c_1)^s$. An execution algorithm
is directly expressed by (Equation 4).
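
A minimal sketch of the execution algorithm of Equation 4 follows, assuming a finite (truncated) state space so that Ĥ is an ordinary matrix. The function name `step` and the death-only example grammar are assumptions for illustration. Because the normalization is a scalar, normalizing after every step gives the same distribution as normalizing once after s steps.

```python
def step(H_hat, p):
    """Apply one rule-firing step: multiply by H_hat, then normalize.

    Returns the normalized distribution and the normalizer 1 . H_hat . p.
    """
    q = [sum(H_hat[n][m] * p[m] for m in range(len(p))) for n in range(len(H_hat))]
    total = sum(q)
    if total == 0:
        raise ValueError("no rule can fire from this distribution")
    return [v / total for v in q], total

# Death-only grammar A -> (nothing) at unit rate on copy numbers 0..2:
# H_hat[m-1][m] = m, following a|m> = m |m-1>.
H_hat = [[0, 1, 0],
         [0, 0, 2],
         [0, 0, 0]]
p0 = [0, 0, 1]                 # start with exactly two copies
p1, norm1 = step(H_hat, p0)    # one firing: exactly one copy remains
p2, norm2 = step(H_hat, p1)    # two firings: pool is empty
```

Here the per-step normalizers are 2 and then 1, so c_1 = 1/2 and c_2 = 1/2, which differs from (c_1)^2 = 1/4: the normalization depends on the state reached.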

3.4 Time-ordered product expansion 

An indispensable tool for studying such stochastic processes in physics is the time-ordered product expansion [3]. We
use the following form:

$$\exp(tH) \cdot p_0 = \exp(t(H_0 + H_1)) \cdot p_0 = \sum_{n=0}^{\infty} \left[ \int_0^t dt_1 \int_{t_1}^t dt_2 \cdots \int_{t_{n-1}}^t dt_n \, \exp((t - t_n) H_0) H_1 \exp((t_n - t_{n-1}) H_0) \cdots H_1 \exp(t_1 H_0) \right] \cdot p_0 \tag{10}$$

where $H_0$ is a solvable or easily computable part of $H$, so the exponentials $\exp(tH_0)$ can be computed or sampled
more easily than $\exp(tH)$. This expression can be used to generate Feynman diagram expansions, in which n denotes
the number of interaction vertices in a graph representing a multi-object history. If we apply (Equation 10) with
$H_1 = \hat H$ and $H_0 = -D$, we derive the well-known Gillespie algorithm for simulating chemical reaction networks [4],
which can now be applied to SPG's. However many other decompositions of $H$ are possible, one of which is used
in Section 4.4 below. Because the operators $H$ can be decomposed in many ways, there are many valid simulation
algorithms for each stochastic process. The particular formulation of the time-ordered product expansion used in
(Equation 10) has the advantage of being recursively self-applicable.

Thus, (Equation 10) entails a systematic approach to the creation of novel simulation algorithms.
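
The Gillespie algorithm obtained from the decomposition with the diagonal part as the solvable term can be sketched as follows: the waiting time is sampled from the exponential decay governed by D (the total firing rate of the current state), and the rule that fires is then chosen with probability proportional to its rate. This is a generic textbook-style sketch for parameterless terms, not the authors' implementation; the function `gillespie`, the `(propensity, update)` rule encoding, and the decay example are assumptions.

```python
import random

def gillespie(pool, rules, t_end, rng):
    """Simulate rule firings until t_end or until no rule can fire.

    pool  : dict mapping type name -> copy number (modified in place)
    rules : list of (propensity_fn, update_fn) pairs, both taking the pool
    rng   : a random.Random instance
    Returns a list of (time, pool snapshot) pairs.
    """
    t = 0.0
    history = [(t, dict(pool))]
    while True:
        props = [propensity(pool) for propensity, _ in rules]
        total = sum(props)              # diagonal entry of D for the current state
        if total <= 0.0:
            break                       # no rule can fire
        t += rng.expovariate(total)     # exponential waiting time with rate D
        if t > t_end:
            break
        (index,) = rng.choices(range(len(rules)), weights=props)
        rules[index][1](pool)           # fire the chosen rule
        history.append((t, dict(pool)))
    return history

# Assumed example: a single decay rule A -> (nothing) with rate 0.7 per copy.
def decay_propensity(pool):
    return 0.7 * pool["A"]

def decay_update(pool):
    pool["A"] -= 1

trajectory = gillespie({"A": 5}, [(decay_propensity, decay_update)],
                       t_end=1e9, rng=random.Random(0))
```

Other splittings of H into a solvable part and an interaction part would lead to different sampling schemes for the same stochastic process.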



3.5 Relation between semantic maps 

Proposition. Given the stochastic parameterized grammar (SPG) rule syntax of Equation 2,

(a) There is a semantic function $\Psi_c$ mapping from any continuous-time, context sensitive, stochastic parameterized
grammar $\Gamma$ via a time evolution operator $H(\hat H(\Gamma))$ to a joint probability density function on the parameter values and
birth/death times of grammar terms, conditioned on the total elapsed time, t.

(b) There is a semantic function $\Psi_d$ mapping any discrete-time, sequential-firing, context sensitive, stochastic
parameterized grammar $\Gamma$ via a time evolution operator $\hat H(\Gamma)$ to a joint probability density function on the parameter values
and birth/death times of grammar terms, conditioned on the total discrete time defined as number of rule firings, s.

(c) The short-time limit of the density $\Psi_c(\Gamma)$ conditioned on $t \to 0$ and conditioned on s is equal to $\Psi_d(\Gamma)$.

Proof: (a): Section 3.2. (b): Section 3.3. (c): Equation 10 (details in [8]). □



3.6 Discussion: Transformations of SPG's 



Given a new kind of mathematical object (here, SPG's or DG's) it is generally productive in mathematics to consider
the transformations of such objects (mappings from one object to another or to itself) that preserve key properties.
Examples include transformational geometry (groups acting on lines and points) and functors acting on categories. In the
case of SPG's, two possibilities for the preserved property are immediately salient. First, an SPG syntactic
transformation $\Gamma \to \Gamma'$ could preserve the semantics $\Psi(\Gamma) = \Psi(\Gamma')$ either fully or just in fixed point form: $\Psi^*(\Gamma) = \Psi^*(\Gamma')$.
Preserving the full semantics would be required of a simulation algorithm. Alternatively, an inference algorithm could
preserve a joint probability distribution on unobserved and observed random variables, in the form of Bayes' rule,

$$\Pr_\Gamma(out, internal | in) \Pr(in) = \Pr(in, internal, out) = \Pr_{inference}(in, internal | out) \Pr(out)$$

where (in, internal, out) are collections of parameterized terms that are inputs to, internal to, and outputs from the
grammar $\Gamma$ respectively.

4 Examples and Reductions 

A number of other frameworks and formalisms can be expressed in or reduced to SPG's as just defined. For example,
data clustering models are easily and flexibly described [1]. We give a sampling here.



4.1 Biochemical reaction networks 

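Given the chemical reaction network syntax

$$\left\{ m_a^{(r)} A_a \,\middle|\, 1 \le a \le A_{max} \right\} \xrightarrow{k^{(r)}} \left\{ n_b^{(r)} A_b \,\middle|\, 1 \le b \le A_{max} \right\} \tag{11}$$

define an index mapping $a(i) = \sum_{c=1}^{A_{max}} c\,\Theta\left(\sum_{d=1}^{c-1} m_d^{(r)} < i \le \sum_{d=1}^{c} m_d^{(r)}\right)$, and likewise for the
product-side mapping $a'(j)$ as a function of $n_b^{(r)}$.

Then (Equation 11) can be translated to the following equivalent grammar syntax for the multisets of parameterless
terms

$$\left\{ \tau_{a(i)} \,\middle|\, 0 < i \le \sum_{c=1}^{A_{max}} m_c^{(r)} \right\} \to \left\{ \tau_{a'(j)} \,\middle|\, 0 < j \le \sum_{c=1}^{A_{max}} n_c^{(r)} \right\} \quad \text{with } k^{(r)}$$

whose semantics is the time-evolution generator

$$\hat O_r = k^{(r)} \left[\prod_{j \in rhs(r)} \hat a_{a'(j)}\right] \left[\prod_{i \in lhs(r)} a_{a(i)}\right] \tag{12}$$

This generator is equivalent to the stochastic process model of mass-action kinetics for the chemical reaction network
(Equation 11).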
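
The index mapping and the resulting mass-action rate can be sketched in a few lines. The sketch assumes reactant stoichiometries are given as a list indexed by species; `index_map` and `propensity` are hypothetical helper names. Applying an annihilation operator m times to a state with n copies yields the factor n(n-1)...(n-m+1), by repeated use of Equation 6, which is what `propensity` computes.

```python
def index_map(stoich):
    """Expand stoichiometries into the term index mapping a(i).

    stoich[c-1] is the stoichiometric coefficient of species c (1-based).
    Returns [a(1), ..., a(M)] where M is the sum of the coefficients.
    """
    mapping = []
    for c, m_c in enumerate(stoich, start=1):
        mapping.extend([c] * m_c)
    return mapping

def propensity(k, stoich, counts):
    """Rate of the reaction in a given state: k times a falling factorial per species."""
    rate = k
    for m_c, n_c in zip(stoich, counts):
        for used in range(m_c):
            rate *= (n_c - used)    # each annihilation contributes the current copy number
    return rate

lhs_terms = index_map([2, 0, 1])               # reaction consuming 2 A_1 and 1 A_3
rate = propensity(0.5, [2, 0, 1], [3, 7, 4])   # 0.5 * (3 * 2) * 4
blocked = propensity(0.5, [2, 0, 1], [1, 7, 4])
```

When a species has fewer copies than the reaction consumes, one factor is zero and the rule cannot fire, as in the `blocked` case.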



4.2 Logic programs 

Consider a logic program (e.g. in pure Prolog) consisting of Horn clauses of positive literals

$$p_1 \wedge \ldots \wedge p_n \Rightarrow q, \quad n \ge 0.$$

Axioms have n = 0. We can translate each such clause into a monotonic SPG rule

$$p_1, \ldots, p_n \to q, p_1, \ldots, p_n \tag{13}$$

where each different literal $p_i$ or $q$ denotes an unparameterized type $\tau_a$ with $n_a \in \{0, \ldots, n_a^{(max)}\} = \{0, 1\}$. Since there
is no with clause, the rule firing rates default to p = 1. The corresponding time-evolution operator is

$$\hat O_r = \left[\prod_{j \in rhs(r) \setminus lhs(r)} \hat a_{a'(j)}\right] \left[\prod_{i \in lhs(r)} N_{a(i)}\right] \tag{14}$$

The semantics of the logic program is its least model or minimal interpretation. It can be computed (Knaster-Tarski
theorem) by starting with no literals in the "pool" and repeatedly drawing all their consequences according to the logic
program. This is equivalent to converging to a fixed point $\Psi^*(\Gamma) \cdot |0\rangle$ of the grammar consisting of rules of
(Equation 13).
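
The least-model computation described here can be sketched as forward chaining to a fixed point. This is a plain deterministic sketch of the fixed point, not a stochastic simulation of the grammar; the clause encoding as `(body, head)` pairs and the function name `least_model` are assumptions made for this example.

```python
def least_model(clauses):
    """Compute the least model of a propositional Horn program.

    clauses : list of (body, head) pairs, where body is a tuple of atoms
              and axioms have an empty body.
    Starts from the empty pool and adds consequences until nothing changes.
    """
    pool = set()
    changed = True
    while changed:
        changed = False
        for body, head in clauses:
            # Monotonic rule: the body atoms stay in the pool, the head is added once.
            if head not in pool and all(atom in pool for atom in body):
                pool.add(head)
                changed = True
    return pool

program = [
    ((), "a"),            # axiom
    (("a",), "b"),        # a => b
    (("b", "c"), "d"),    # b and c => d, never fires because c is not derivable
]
model = least_model(program)
```

The `head not in pool` check plays the role of the copy-number bound of 1: each literal is either present or absent.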



More general clauses include negative literals $\neg r$ on the LHS, as $p_1 \wedge \ldots \wedge p_n \wedge \neg r_1 \wedge \ldots \Rightarrow q$, or even more general
cardinality constraint atoms $0 \le l \le |Z| = \sum_{i \in A} \Theta(p_i) \le u \le \infty$ [5]. These constraints can be expressed in
operator algebra by expanding the basis operator set $\{B_\alpha\}$ beyond the basic creation and annihilation operators [1].
Finally, atoms with function symbols may be admitted using parameterized terms $\tau_a(x)$.



4.3 Graph grammars 



Graph grammars are composed of local rewrite rules for graphs (see for example [6]). We now express a class
of graph grammars in terms of SPG's. The following syntax introduces Object Identifier (OID) labels $L_i$ for each
parameterized term, and allows labelled terms to point to one another through a graph of such labels. The graph is
related to two subgraphs of neighborhood indices $N(i, \sigma)$ and $N'(j, \sigma)$ specific to the input and output sides of a rule.
Like types or variables, the label symbols appearing in a rule are chosen from an alphabet $\{L_\lambda | \lambda \in \Lambda\}$. Unlike types
but like variables $X_c$, the label symbols $L_{\lambda(i)}$ actually denote nonnegative integer values: unique addresses or object
identifiers.

A graph grammar rule is of the form, for some nonnegative-integer-valued functions $\lambda(i)$, $\lambda'(j)$, $N(i, \sigma)$, $N'(j, \sigma)$
for which $(\lambda(i) = \lambda(j)) \Rightarrow (i = j)$, $(\lambda'(i) = \lambda'(j)) \Rightarrow (i = j)$:

$$\{L_{\lambda(i)} := \tau_{a(i)}(x_{a(i)}; (L_{N(i,\sigma)} | \sigma \in 1..\sigma_{a(i)}^{max})) \,|\, i \in I\}$$
$$\to \{L_{\lambda'(j)} := \tau_{a'(j)}(x'_{a'(j)}; (L_{N'(j,\sigma)} | \sigma \in 1..\sigma_{a'(j)}^{max})) \,|\, j \in J\} \quad \text{with } p_r(\{x'_{a'(j)}\} | \{x_{a(i)}\}) \tag{15}$$

(compare to (Equation 2)). Note that the fanout of the graph is limited by $\sigma_a^{max}$. Let $I_1$ and $I_2$ be mutually
exclusive and exhaustive, and the same for $J_1$ and $J_2$. Define $J_1 = \{j \in J \wedge (\exists i \in I_2 | \lambda(i) = \lambda'(j))\}$,
$J_2 = \{j \in J \wedge (\nexists i \in I_2 | \lambda(i) = \lambda'(j))\}$, and $I_3 = \{i \in I_2 \wedge \nexists j \in J_1 | \lambda(i) = \lambda'(j)\} \subseteq I_2$. Then the graph syntax may be
translated to the following ordinary non-graph grammar rule (where NextOID is a variable, and OIDGen and Null are
types reserved for the translation):

$$\{\tau_{a(i)}(L_{\lambda(i)}, x_{a(i)}, (L_{N(i,\sigma)} | \sigma \in 1..\sigma_{a(i)}^{max})) \,|\, i \in I\}, \; OIDGen(NextOID)$$
$$\to \{\tau_{a'(j)}(L_{\lambda'(j)}, x'_{a'(j)}, (L_{N'(j,\sigma)} | \sigma \in 1..\sigma_{a'(j)}^{max})) \,|\, j \in J_1 \wedge (i \in I_2) \wedge (\lambda(i) = \lambda'(j))\}$$
$$\cup \; \{\tau_{a'(j)}(L_{\lambda'(j)}, x'_{a'(j)}, (L_{N'(j,\sigma)} | \sigma \in 1..\sigma_{a'(j)}^{max})) \,|\, j \in J_2\}$$
$$\cup \; \{Null(L_{\lambda(i)}) \,|\, i \in I_3\} \cup \{OIDGen(NextOID + |J|)\}$$
$$\text{with } p_r(\{x'_{a'(j)}\} | \{x_{a(i)}\}) \prod_{j \in J_2} \delta_K(L_{\lambda'(j)}, NextOID + j - 1)$$

which already has a defined semantics $\Psi_{c/d}$. Note that all set membership tests can be done at translation time because
they do not use information that is only available dynamically during the grammar evolution. Optionally we may also
add a rule schema (one rule per type, $\tau_a$) to eliminate any dangling pointers.

Strings may be encoded as one-dimensional graphs using either a singly or doubly linked list data structure. String
rewrite rules are emulated as graph rewrite rules, whose semantics are defined above. This form is capable of handling
many L-system grammars [7].



4.4 Stochastic and ordinary differential equations 

There are SPG rule forms corresponding to stochastic differential equations governing diffusion and transport. Given
the SDE or equivalent Langevin equation (which specializes to a system of ordinary differential equations when
$\eta(t) = 0$):

$$dx_i = v_i(\{x_k\})\,dt + \sigma(\{x_k\})\,dW \quad \text{or} \tag{16}$$
$$\frac{dx_i}{dt} = v_i(\{x_k\}) + \eta_i(t) \tag{17}$$

under some conditions on the noise term $\eta(t)$ the dynamics can be expressed [3] as a Fokker-Planck equation for the
probability distribution $P(\{x\}, t)$:

$$\frac{\partial P(\{x\}, t)}{\partial t} = -\sum_i \frac{\partial}{\partial x_i} v_i(\{x\}) P(\{x\}, t) + \sum_{ij} \frac{\partial^2}{\partial x_i \partial x_j} D_{ij}(\{x\}) P(\{x\}, t) \tag{18}$$

Let $P(\{y\}, t | \{x\}, 0)$ be the solution of this equation given initial condition $P(\{y\}, 0) = \delta(\{y\} - \{x\}) = \prod_k \delta(y_k - x_k)$
(with Dirac delta function appropriate to the particular measure $\mu$ used for each component). Then at t = 0,

$$\frac{\partial P(\{y\}, 0 | \{x\}, 0)}{\partial t} \equiv p(\{y_i\} | \{x_i\}) = -\sum_i \frac{\partial}{\partial y_i} v_i(\{y\}) \delta(\{y\} - \{x\}) + \sum_{ij} \frac{\partial^2}{\partial y_i \partial y_j} D_{ij}(\{y\}) \delta(\{y\} - \{x\})$$

Thus the probability rate $p(\{y_i\} | \{x_i\})$ is given by a differential operator acting on a Dirac delta function. By (Equation
8) we construct the evolution generator operators $O_{FP} = O_{drift} + O_{diffusion}$, where

$$O_{drift} = -\int d\{x\} \int d\{y\}\, \hat a(\{y\}) a(\{x\}) \left(\sum_i \nabla_{y_i} v_i(\{y\}) \prod_k \delta(y_k - x_k)\right)$$
$$O_{diffusion} = \int d\{x\} \int d\{y\}\, \hat a(\{y\}) a(\{x\}) \left(\sum_{ij} \nabla_{y_i} \nabla_{y_j} D_{ij}(\{y\}) \prod_k \delta(y_k - x_k)\right)$$

The second order derivative terms give diffusion dynamics and also regularize and promote continuity of probability
in parameter space both along and transverse to any local drift direction. Calculations with such expressions are shown
in [1].

Diffusion/drift rules can be combined with chemical reaction rules to describe reaction-diffusion systems [2]. The
foregoing approach can be generalized to encompass partial differential equations and stochastic partial differential
equations [1].

These operator expressions all correspond to natural extended-time processes given by the evolution of continuous
differential equations. The operator semantics of the differential equations is given in terms of derivatives of delta
functions. A special "solve" or "solving" keyword may be used to introduce such ODE/SDE rule clauses in the
SPG syntax. This syntax can be eliminated in favor of a "with" clause by using derivatives of delta functions in
the rate expression $p_{DE}(\{y_i\} | \{x_i\})$, provided that such generalized functions are in the Banach space $\mathcal{F}(V)$ as a
limit of functions. If a grammar includes such DE rules along with non-DE rules, a solver can be used to compute
$\exp((t_{n+1} - t_n) O_{FP})$ in the time-ordered product for $\exp(tH)$ as a hybrid simulation algorithm for discontinuous
(jump) stochastic processes combined with stochastic differential equations.
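
The "solver" step between jump events could be any numerical integrator for the SDE. As one hedged possibility, the sketch below uses the standard Euler-Maruyama scheme for Equation 16; this is a swapped-in generic technique, not a method given in this paper. The function name `euler_maruyama`, the scalar noise amplitude, and the use of an independent Gaussian increment per component are assumptions.

```python
import math
import random

def euler_maruyama(v, sigma, x0, dt, n_steps, rng):
    """Integrate dx_i = v_i(x) dt + sigma(x) dW_i with fixed step dt.

    v     : function returning the drift vector for state x
    sigma : function returning a scalar noise amplitude for state x
    rng   : a random.Random instance
    Returns the list of visited states, including the initial one.
    """
    x = list(x0)
    path = [tuple(x)]
    for _ in range(n_steps):
        drift = v(x)
        amplitude = sigma(x)
        # Each component gets its own Gaussian increment with variance dt (an assumption).
        x = [xi + di * dt + amplitude * math.sqrt(dt) * rng.gauss(0.0, 1.0)
             for xi, di in zip(x, drift)]
        path.append(tuple(x))
    return path

# With zero noise the scheme reduces to forward Euler for the ODE dx/dt = -x.
ode_path = euler_maruyama(lambda x: [-xi for xi in x], lambda x: 0.0,
                          x0=[1.0], dt=0.1, n_steps=10, rng=random.Random(1))
sde_path = euler_maruyama(lambda x: [-xi for xi in x], lambda x: 0.3,
                          x0=[1.0], dt=0.1, n_steps=10, rng=random.Random(1))
```

In a hybrid simulation, such an integrator would advance the continuous parameters over the interval between two sampled jump times, after which a discrete rule fires.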



4.5 Discussion: Relevance to artificial intelligence and computational science 

The relevance of the modeling language defined here to artificial intelligence includes the following points. First,
pattern recognition and machine learning both benefit foundationally from better, more descriptively adequate
probabilistic domain models. As an example, [1] exhibits hierarchical clustering data models expressed very simply in
terms of SPG's and relates them to recent work. Graphical models are probabilistic domain models with a fixed
structure of variables and their relationships, by contrast with the inherently flexible variable sets and dependency
structures resulting from the execution of stochastic parameterized grammars. Thus SPG's, unlike graphical models,
are Variable-Structure Systems (defined in [8]), and consequently they can support compositional description of
complex situations such as multiple object tracking in the presence of cell division in biological imagery [9]. Second, the
reduction of many divergent styles of model to a common SPG syntax and operator algebra semantics enables new
possibilities for hybrid model forms. For example one could combine logic programming with probability distribution
models, or discrete-event stochastic and differential equation models as discussed in Section 4.4, in possibly new ways.

As a third point of AI relevance, from SPG probabilistic domain models it is possible to derive algorithms for
simulation (as in Section 3.4) and inference either by hand or automatically. Of course, inference algorithms are not as well
worked out yet for SPG's as for graphical models. SPG's have the advantage that simulation or inference algorithms
could be expressed again in the form of SPG's, a possibility demonstrated in part by the encoding of logic programs as
SPG's. Since both model and algorithm are expressed as SPG's, it is possible to use SPG transformations that preserve
relevant quantities (Section 3.6) as a technique for deriving such novel algorithms or generating them automatically.
For example we have taken this approach to rederive by hand the Gillespie simulation algorithm for chemical kinetics.
This derivation is different from the one in Section 3.4. Because SPG's encompass graph grammars it is even possible
in principle to express families of valid SPG transformations as meta-SPG's. All of these points apply a fortiori to
Dynamical Grammars as well.

The relevance of the modeling language defined here to computational science includes the following points.
First, as argued previously, multiscale models must encompass and unify heterogeneous model types such as
discrete/continuous or stochastic/deterministic dynamical models; this unification is provided by SPG's and DG's.
Second, a representationally adequate computerized modeling language can be of great assistance in constructing
mathematical models in science, as demonstrated for biological regulatory network models by Cellerator [10] and other cell
modeling languages. DG's extend this promise to more complex, spatiotemporally dynamic, variable-structure system
models such as occur in biological development. Third, machine learning techniques could in principle be applied to
find simplified approximate or reduced models of emergent phenomena within complex domain models. In that case
the foregoing AI arguments apply to computational science applications of machine learning as well.

Both for artificial intelligence and computational science, future work will be required to determine whether the 
prospects outlined above are both realizable and compelling. The present work is intended to provide a mathematical 
foundation for achieving that goal. 

5 Conclusion 

We have established a syntax and semantics for a probabilistic modeling language based on independent processes 
leading to events linked by a shared set of objects. The semantics is based on a polynomial ring of time-evolution 
operators. The syntax is in the form of a set of rewrite rules. Stochastic Parameterized Grammars expressed in 
this language can compactly encode disparate models: generative cluster data models, biochemical networks, logic 
programs, graph grammars, string rewrite grammars, and stochastic differential equations among others. The
time-ordered product expansion connects this framework to powerful methods from quantum field theory and operator 
algebra. 